Issue#1307 json string read write#1310
Conversation
Corrects a typo and contraction in adv-matchtypes.md ("transfomration" -> "transformation", "dont" -> "don't"). Adds a footnote to field-definitions.md clarifying what Zingg Enterprise is and linking to the product comparison page for feature details.
* Fix typo and add Zingg Enterprise footnote
Corrects a typo and contraction in adv-matchtypes.md ("transfomration" -> "transformation", "dont" -> "don't"). Adds a footnote to field-definitions.md clarifying what Zingg Enterprise is and linking to the product comparison page for feature details.
* Merge remote-tracking branch 'upstream/main'
* Throw on invalid match type; update test to write args
Change MatchTypes to throw an Exception when a match type name is not found instead of returning null, preventing silent null returns. Update TestArguments to call argumentService.writeArguments(...) to persist the Arguments before reloading them
* Use IllegalArgumentException in getByName
updated throw IllegalArgumentException instead of the generic Exception.
* Add ClickHouse JDBC documentation * ix --------- Co-authored-by: Ishan Arora <ishanarora@Ishans-MacBook-Air.local>
Remove legacy Python export/peek model phases and related resources, update client wiring and argument handling. ZinggOptions (assess/peek/export) and the Spark Python phase runner mapping were removed; python/exportModel script, Python-related helpers (argument write/JSON methods) and many test/resources/docs for the peek/export flows were deleted
There was a problem hiding this comment.
Pull request overview
This PR removes legacy Python/CLI-era model phases (assess/peek/export) and related Python “Arguments JSON string/file” helpers, updating the Java client wiring accordingly, and cleans up associated tests/resources/docs. It also adds initial ClickHouse JDBC connector documentation and bumps the Docker image to Zingg 0.6.0 artifacts.
Changes:
- Removed deprecated phases (ASSESS_MODEL / PEEK_MODEL / EXPORT_MODEL) and their Spark executor wiring/resources.
- Removed Python
ArgumentsJSON-string/JSON-file helper APIs and theexportModelPython phase/script; updated argument writing to be file-based. - Added ClickHouse JDBC connector doc and updated Dockerfile to use Zingg 0.6.0 distribution.
Reviewed changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
test/testFebrl/testArgs.py |
Removes Python tests for deprecated Arguments JSON helpers and peek-model phase default. |
spark/core/src/test/resources/testPeekModel/test.csv |
Deletes peek-model test dataset resource. |
spark/core/src/test/resources/testPeekModel/config.json |
Deletes peek-model test config resource. |
spark/core/src/test/java/zingg/spark/core/executor/TestPeekModel.java |
Removes obsolete peek-model integration test. |
spark/core/src/main/java/zingg/spark/core/executor/SparkZFactory.java |
Removes mapping for deprecated peek-model phase runner. |
spark/core/src/main/java/zingg/spark/core/executor/SparkPythonPhaseRunner.java |
Deletes generic Spark Python phase runner used by removed phases. |
spark/client/src/test/java/zingg/spark/client/TestArguments.java |
Updates test to write arguments to JSON file via writeArguments(...). |
python/zingg/client.py |
Removes Python-side Arguments JSON read/write/copy helpers; adjusts ClientOptions defaults/arg parsing. |
python/phases/exportModel.py |
Deletes legacy exportModel Python phase script. |
docs/setup/training/exportLabeledData.md |
Removes documentation page for exportModel flow. |
docs/setup/training/addOwnTrainingData.md |
Removes reference to exporting labeled data for reuse. |
docs/connectors/jdbc/clickhouse.md |
Adds ClickHouse JDBC connector configuration documentation. |
docs/SUMMARY.md |
Removes nav entry for exporting labeled data doc. |
common/core/src/test/resources/testPeekModel/test.csv |
Deletes common-core peek-model test dataset. |
common/core/src/test/resources/testPeekModel/config.json |
Deletes common-core peek-model test config. |
common/client/src/test/java/zingg/common/client/TestArguments.java |
Import ordering/assertion import tightening (no functional change). |
common/client/src/main/java/zingg/common/client/options/ZinggOptions.java |
Removes deprecated phase constants from supported options list. |
common/client/src/main/java/zingg/common/client/arguments/ArgumentServiceImpl.java |
Switches argument writing to WriterType.FILE (file-based writer). |
common/client/src/main/java/zingg/common/client/MatchTypes.java |
Changes invalid match type handling to throw IllegalArgumentException instead of returning null. |
common/client/src/main/java/zingg/common/client/Client.java |
Removes silent fallback to peek-model phase; logs and rethrows on invalid phase. |
Dockerfile |
Updates ZINGG_HOME and downloaded distribution to 0.6.0 (Spark build referenced updated). |
Comments suppressed due to low confidence (2)
python/zingg/client.py:752
ClientOptions.__init__usesprint(...)statements, which will pollute stdout for library consumers and test output. Since this is a library API, preferLOG.debug(...)(or remove these statements entirely) so output can be controlled via logging configuration.
print(argsSent)
if argsSent == None:
args = []
else:
args = argsSent.copy()
if self.PHASE not in args:
args.append(self.PHASE)
if self.LICENSE not in args:
args.append(self.LICENSE)
args.append("zinggLic.txt")
if self.EMAIL not in args:
args.append(self.EMAIL)
args.append("zingg@zingg.ai")
if self.CONF not in args:
args.append(self.CONF)
args.append("dummyConf.json")
print("arguments for client options are ", args)
self.co = getJVM().zingg.common.client.ClientOptions(args)
spark/client/src/test/java/zingg/spark/client/TestArguments.java:68
- This test now writes to a hard-coded path under
/tmpbut never cleans it up. That can make tests flaky on restricted CI runners and can leave artifacts behind. Prefer using a JUnit temp directory/file (e.g.,@TempDir) and deleting the file after the assertion, or useFiles.createTempFile(...)scoped to the test.
args.setCollectMetrics(true);
args.setModelId("500");
argumentService.writeArguments("/tmp/configFromArgObject.json", args);
//reload the same config file to check if deserialization is successful
IArguments newArgs = argumentService.loadArguments("/tmp/configFromArgObject.json");
assertEquals(newArgs.getModelId(), "500", "Model id is different");
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -711,29 +703,6 @@ def createArgumentsFromJSON(fileName, phase): | |||
| obj.args = getJVM().zingg.common.client.argumentst.ArgumentServiceImpl().loadArguments(fileName) | |||
There was a problem hiding this comment.
createArgumentsFromJSON is calling getJVM().zingg.common.client.argumentst.ArgumentServiceImpl(). The argumentst package looks like a typo and will fail at runtime when Python tries to load arguments from a JSON config. Please change it to the correct JVM path for ArgumentServiceImpl under zingg.common.client.arguments (and consider adding/adjusting a unit test that exercises this call).
| obj.args = getJVM().zingg.common.client.argumentst.ArgumentServiceImpl().loadArguments(fileName) | |
| obj.args = getJVM().zingg.common.client.arguments.ArgumentServiceImpl().loadArguments(fileName) |
| if self.PHASE not in args: | ||
| args.append(self.PHASE) |
There was a problem hiding this comment.
After removing the default phase value, ClientOptions.__init__ still appends --phase when it’s missing but does not append any phase value. This produces an invalid arg list and the JVM ClientOptions parser will throw “Missing argument for option '--phase'” for ClientOptions() and for any call site that doesn’t provide --phase explicitly (several tests construct ClientOptions() this way). Either stop auto-inserting --phase unless a value is also provided, or raise a clear Python-side error when --phase is omitted.
| if self.PHASE not in args: | |
| args.append(self.PHASE) |
| WORKDIR /zingg-0.5.0 | ||
| RUN curl --location https://github.com/zinggAI/zingg/releases/download/v0.5.0/zingg-0.5.0-spark-3.5.0.tar.gz | \ | ||
| WORKDIR /zingg-0.6.0 | ||
| RUN curl --location https://github.com/zinggAI/zingg/releases/download/v0.6.0/zingg-0.6.0-spark-3.6.0.tar.gz | \ |
There was a problem hiding this comment.
The Docker base image is still apache/spark:3.5.0-python3, but the downloaded Zingg distribution is the ...-spark-3.6.0.tar.gz build. Mixing a Spark 3.6 build with a Spark 3.5 runtime can lead to binary incompatibilities at runtime. Please align these by either updating the base image to Spark 3.6 (if intended) or downloading the Spark 3.5 build of the Zingg tarball.
| RUN curl --location https://github.com/zinggAI/zingg/releases/download/v0.6.0/zingg-0.6.0-spark-3.6.0.tar.gz | \ | |
| RUN curl --location https://github.com/zinggAI/zingg/releases/download/v0.6.0/zingg-0.6.0-spark-3.5.0.tar.gz | \ |
| * [Using Pre-existing Training Data](setup/training/addOwnTrainingData.md) | ||
| * [Updating Labeled Pairs](updatingLabels.md) | ||
| * [Documenting The Training Data](stepbystep/createtrainingdata/generatingdocumentation.md) | ||
| * [Exporting Labeled Data](setup/training/exportLabeledData.md) | ||
| * [Model Difference](stepbystep/createtrainingdata/modeldiff.md) | ||
| * [Ensuring Scalability](verifyBlocking.md) | ||
| * [Building And Saving The Model](setup/train.md) |
There was a problem hiding this comment.
A new connector doc was added at docs/connectors/jdbc/clickhouse.md, but it isn’t linked from docs/SUMMARY.md. With mdBook/GitBook-style navigation, unlisted pages are typically undiscoverable (and may not be built). Please add ClickHouse under the JDBC section (near the existing Postgres/MySQL entries) so the page is reachable.
|
Duplicate #1311 |
Remove legacy Python export/peek model phases and related resources, update client wiring and argument handling. ZinggOptions (assess/peek/export) and the Spark Python phase runner mapping were removed; python/exportModel script, Python-related helpers (argument write/JSON methods) and many test/resources/docs for the peek/export flows were deleted. fixes #1307 . fixes #1309